Goto

Collaborating Authors

 label model





Mitigating Source Bias for Fairer Weak Supervision

Neural Information Processing Systems

Theoretically, we show that it is possible for our approach to simultaneously improve both accuracy and fairness--in contrast to standard fairness approaches that suffer from tradeoffs. Empirically, we show that our technique improves accuracy on weak supervision baselines by as much as 32% while reducing demographic parity gap by 82.5%.



DP-SSL: TowardsRobustSemi-supervisedLearning withAFewLabeledSamples

Neural Information Processing Systems

However, when the size of labeled data is very small (say a few labeled samples per class), SSL performs poorly and unstably, possibly due to the low qualityoflearnedpseudolabels.Inthispaper,weproposeanewSSLmethodcalled DP-SSL that adopts an innovative data programming (DP) scheme to generate probabilistic labels for unlabeled data. Different from existing DP methods that rely on human experts to provide initial labeling functions (LFs), we develop a multiple-choice learning (MCL) based approach to automatically generate LFs fromscratchinSSLstyle. Withthenoisylabelsproduced bytheLFs,wedesign a label model to resolve the conflict and overlap among the noisy labels, and finally infer probabilistic labels for unlabeled samples.





End-to-End Weak Supervision Carnegie Mellon University 2

Neural Information Processing Systems

Aggregating multiple sources of weak supervision (WS) can ease the data-labeling bottleneck prevalent in many machine learning applications, by replacing the tedious manual collection of ground truth labels. Current state of the art approaches that do not use any labeled training data, however, require two separate modeling steps: Learning a probabilistic latent variable model based on the WS sources - making assumptions that rarely hold in practice - followed by downstream model training. Importantly, the first step of modeling does not consider the performance of the downstream model. To address these caveats we propose an end-to-end approach for directly learning the downstream model by maximizing its agreement with probabilistic labels generated by reparameterizing prior probabilistic posteriors with a neural network. Our results show improved performance over prior work in terms of end model performance on downstream test sets, as well as in terms of improved robustness to dependencies among weak supervision sources.